-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Add schema validation and placeholders to index mappings #3240
[Enhancement] Add schema validation and placeholders to index mappings #3240
Conversation
…ad of string constants Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
…ums rather than use their own mappings Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
…holders Signed-off-by: Pavan Yekbote <[email protected]>
…h_index_mappings_from_files
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
…cter issue Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: Pavan Yekbote <[email protected]>
public static String getMappingFromFile(String path) throws IOException { | ||
URL url = IndexUtils.class.getClassLoader().getResource(path); | ||
if (url == null) { | ||
throw new IOException("Resource not found: " + path); | ||
} | ||
|
||
String mapping = Resources.toString(url, Charsets.UTF_8).trim(); | ||
if (mapping.isEmpty() || !StringUtils.isJson(mapping)) { | ||
throw new IllegalArgumentException("Invalid or non-JSON mapping at: " + path); | ||
if (mapping.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to have this check here considering we are going to check mapping.isBlank()
in the replacePlaceholders
method. Seems like redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added this check here since this method can be used on its own, incase used in other places, i believe it is safer to have this additional check
} | ||
|
||
String placeholderMapping = Resources.toString(url, Charsets.UTF_8); | ||
mapping = mapping.replace(placeholder.getKey(), placeholderMapping); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are creating string in every replace. May be we could use a StringBuilder to have in-place replacement?
May be something like this:
public static String replacePlaceholders(String mapping) throws IOException {
if (mapping == null || mapping.isBlank()) {
throw new IllegalArgumentException("Mapping cannot be null or empty");
}
// Preload resources into memory to avoid redundant I/O
Map<String, String> loadedPlaceholders = new HashMap<>();
for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
if (url == null) {
throw new IOException("Resource not found: " + placeholder.getValue());
}
// Load and cache the content
loadedPlaceholders.put(placeholder.getKey(), Resources.toString(url, Charsets.UTF_8));
}
// Use StringBuilder for efficient in-place replacements
StringBuilder result = new StringBuilder(mapping);
for (Map.Entry<String, String> entry : loadedPlaceholders.entrySet()) {
String placeholder = entry.getKey();
String replacement = entry.getValue();
// Replace all occurrences of the placeholder
int index;
while ((index = result.indexOf(placeholder)) != -1) {
result.replace(index, index + placeholder.length(), replacement);
}
}
return result.toString();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, thanks for the suggestion, saves on the I/O and is more efficient!
@@ -336,4 +347,25 @@ public static JsonObject getJsonObjectFromString(String jsonString) { | |||
return JsonParser.parseString(jsonString).getAsJsonObject(); | |||
} | |||
|
|||
public static void validateSchema(String schemaString, String instanceString) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add try catch block to catch any exceptions like:
catch (JsonProcessingException e) {
throw new IllegalArgumentException("Invalid JSON format: " + e.getMessage(), e);
} catch (Exception e) {
throw new OpenSearchParseException("Schema validation failed: " + e.getMessage(), e);
}
|
||
// Validate JSON node against the schema | ||
Set<ValidationMessage> errors = schema.validate(jsonNode); | ||
if (!errors.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this:
if (!errors.isEmpty()) {
String errorMessage = errors.stream()
.map(ValidationMessage::getMessage)
.collect(Collectors.joining(", "));
throw new OpenSearchParseException(
"Validation failed: " + errorMessage +
" for instance: " + instanceString +
" with schema: " + schemaString
);
}
@@ -1,6 +1,6 @@ | |||
{ | |||
"_meta": { | |||
"schema_version": "1" | |||
"schema_version": 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, I see that JSON files in the repository are named using snake_case. Let's follow the same convention.
…tion handling Signed-off-by: Pavan Yekbote <[email protected]>
@dhrubo-os addressed review comments, please review! |
- "_meta_.schema_version" provided type is integer | ||
|
||
Note: Validation can be made more strict if a specific schema is defined for each index. | ||
* Checks if mapping is a valid json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Nit] if you want to be fancy :D
/**
* Checks if the provided mapping is a valid JSON and validates it against a schema.
*
* <p>The schema is located at <code>mappings/schema.json</code> and enforces the following validations:</p>
*
* <ul>
* <li>Mandatory fields:
* <ul>
* <li><code>_meta</code></li>
* <li><code>_meta.schema_version</code></li>
* <li><code>properties</code></li>
* </ul>
* </li>
* <li>No additional fields are allowed at the root level.</li>
* <li>No additional fields are allowed in the <code>_meta</code> object.</li>
* <li><code>properties</code> must be an object type.</li>
* <li><code>_meta</code> must be an object type.</li>
* <li><code>_meta.schema_version</code> must be an integer.</li>
* </ul>
*
* <p><strong>Note:</strong> Validation can be made stricter if a specific schema is defined for each index.</p>
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks neat, ive added it
Signed-off-by: Pavan Yekbote <[email protected]>
#3240) * feat(index mappings): fetch mappings and version from json file instead of string constants Signed-off-by: Pavan Yekbote <[email protected]> * refactor: changing exception being thrown Signed-off-by: Pavan Yekbote <[email protected]> * chore: remove unused file Signed-off-by: Pavan Yekbote <[email protected]> * chore: fix typo in comment Signed-off-by: Pavan Yekbote <[email protected]> * chore: adding new line at the end of files Signed-off-by: Pavan Yekbote <[email protected]> * feat: add test cases Signed-off-by: Pavan Yekbote <[email protected]> * fix: remove test code Signed-off-by: Pavan Yekbote <[email protected]> * fix(test): in main the versions were not updated appropriately Signed-off-by: Pavan Yekbote <[email protected]> * refactor: move mapping templates under common module Signed-off-by: Pavan Yekbote <[email protected]> * refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings Signed-off-by: Pavan Yekbote <[email protected]> * refactor: update comment Signed-off-by: Pavan Yekbote <[email protected]> * feat: add enhancements to validate index schema and allow using placeholders Signed-off-by: Pavan Yekbote <[email protected]> * refactor: modifying comment Signed-off-by: Pavan Yekbote <[email protected]> * test: adding testcase for MLIndex to catch failures before runtime Signed-off-by: Pavan Yekbote <[email protected]> * refactor: rename dir from mappings to index-mappings Signed-off-by: Pavan Yekbote <[email protected]> * fix: add null checks Signed-off-by: Pavan Yekbote <[email protected]> * fix: modify mappin paths for placeholders Signed-off-by: Pavan Yekbote <[email protected]> * fix: adding dependencies for testing Signed-off-by: Pavan Yekbote <[email protected]> * fix(test): compare json object rather than strings to avoid eol character issue Signed-off-by: Pavan Yekbote <[email protected]> * refactor: combine if statements into single check Signed-off-by: Pavan Yekbote <[email protected]> * refactoring: null handling + clean code Signed-off-by: Pavan Yekbote <[email protected]> * spotless apply Signed-off-by: Pavan Yekbote <[email protected]> * tests: adding more UT Signed-off-by: Pavan Yekbote <[email protected]> * fix: dependencies to handle jarhell Signed-off-by: Pavan Yekbote <[email protected]> * spotless apply Signed-off-by: Pavan Yekbote <[email protected]> * refactor: add header and use single instance of mapper Signed-off-by: Pavan Yekbote <[email protected]> * fixed: doc syntax Signed-off-by: Pavan Yekbote <[email protected]> * refactor: renamed files, efficient loading of resources, better exception handling Signed-off-by: Pavan Yekbote <[email protected]> * refactor: cleaner comment Signed-off-by: Pavan Yekbote <[email protected]> --------- Signed-off-by: Pavan Yekbote <[email protected]> (cherry picked from commit 374cfd5)
opensearch-project#3240) * feat(index mappings): fetch mappings and version from json file instead of string constants Signed-off-by: Pavan Yekbote <[email protected]> * refactor: changing exception being thrown Signed-off-by: Pavan Yekbote <[email protected]> * chore: remove unused file Signed-off-by: Pavan Yekbote <[email protected]> * chore: fix typo in comment Signed-off-by: Pavan Yekbote <[email protected]> * chore: adding new line at the end of files Signed-off-by: Pavan Yekbote <[email protected]> * feat: add test cases Signed-off-by: Pavan Yekbote <[email protected]> * fix: remove test code Signed-off-by: Pavan Yekbote <[email protected]> * fix(test): in main the versions were not updated appropriately Signed-off-by: Pavan Yekbote <[email protected]> * refactor: move mapping templates under common module Signed-off-by: Pavan Yekbote <[email protected]> * refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings Signed-off-by: Pavan Yekbote <[email protected]> * refactor: update comment Signed-off-by: Pavan Yekbote <[email protected]> * feat: add enhancements to validate index schema and allow using placeholders Signed-off-by: Pavan Yekbote <[email protected]> * refactor: modifying comment Signed-off-by: Pavan Yekbote <[email protected]> * test: adding testcase for MLIndex to catch failures before runtime Signed-off-by: Pavan Yekbote <[email protected]> * refactor: rename dir from mappings to index-mappings Signed-off-by: Pavan Yekbote <[email protected]> * fix: add null checks Signed-off-by: Pavan Yekbote <[email protected]> * fix: modify mappin paths for placeholders Signed-off-by: Pavan Yekbote <[email protected]> * fix: adding dependencies for testing Signed-off-by: Pavan Yekbote <[email protected]> * fix(test): compare json object rather than strings to avoid eol character issue Signed-off-by: Pavan Yekbote <[email protected]> * refactor: combine if statements into single check Signed-off-by: Pavan Yekbote <[email protected]> * refactoring: null handling + clean code Signed-off-by: Pavan Yekbote <[email protected]> * spotless apply Signed-off-by: Pavan Yekbote <[email protected]> * tests: adding more UT Signed-off-by: Pavan Yekbote <[email protected]> * fix: dependencies to handle jarhell Signed-off-by: Pavan Yekbote <[email protected]> * spotless apply Signed-off-by: Pavan Yekbote <[email protected]> * refactor: add header and use single instance of mapper Signed-off-by: Pavan Yekbote <[email protected]> * fixed: doc syntax Signed-off-by: Pavan Yekbote <[email protected]> * refactor: renamed files, efficient loading of resources, better exception handling Signed-off-by: Pavan Yekbote <[email protected]> * refactor: cleaner comment Signed-off-by: Pavan Yekbote <[email protected]> --------- Signed-off-by: Pavan Yekbote <[email protected]>
Description
Added schema validation and placeholders to index mappings
Related Issues
Resolves #2951
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.